System and method for automated funduscopic image analysis

ABSTRACT

A system and method of classifying images of pathology. An image is received, normalized, and segmented into a plurality of regions. A disease vector is automatically determined for each of the plurality of regions with at least one classifier comprising a neural network. Each of the respective plurality of regions is automatically annotated, based on the determined disease vectors. The received image is automatically graded based on at least the annotations. The neural network is trained based on at least an expert annotation of respective regions of images, according to at least one objective classification criterion. The images may be eye images, vascular images, or funduscopic images. The disease may be a vascular disease, vasculopathy, or diabetic retinopathy, for example.

CROSS REFERENCE TO RELATED APPLICATION

The present application is a Continuation of U.S. patent application Ser. No. 15/963,716, filed Apr. 26, 2018, now U.S. Pat. No. 10,719,936, issued Jul. 21, 2020, which claims priority from U.S. Provisional Patent Application No. 62/490,977, filed Apr. 27, 2017, the entireties of which are expressly incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to the field of automated funduscopic image analysis using neural networks (“NN”) or deep neural networks (“DNN”).

BACKGROUND OF THE INVENTION

Each of the references cited herein is expressly incorporated herein by reference in its entirety.

The human retina is readily photographed using a variety of commonly available specialist cameras. In the UK, most opticians, in particular the major chains, have such cameras in their practices already, and a retinal photograph is increasingly offered as part of a standard eye test.

Retinal images may be used to assess many eye diseases, or retinopathy in general, and over recent years, as the resolution of digital cameras has increased, photography has become the preferred way to assess retinal health. There are many types of retinopathy which can be detected on the image, from hemorrhages to tumors, and although many such diseases can lead to serious consequences (including blindness), the early stages usually cause no symptoms a patient can detect. As early intervention is often critical to successful management and/or cure, there are a number of compelling public health reasons to make assessing retinal images more routine.

Type II diabetes is a particularly relevant disease, as one of its consequences is the development of small bleeds on the retina (micro-aneurysms). If left untreated, these can develop rapidly into serious eye disease, possibly before the patient has been diagnosed as a type II diabetic.

Diabetic retinopathy (“DR”) is the leading cause of adult blindness worldwide, including developed countries like the UK. Consequently, the UK NHS offers all type II diabetes patients a free retinal screen every year, where a digital photo is taken and passed to a grading center, and trained graders (humans) examine the image in minute detail for the earliest signs of bleeding. This manual process is costly, but to date is the most reliable way to process these images. Despite years of research, no automated system has yet been able to demonstrate the levels of reliability and accuracy a trained human can achieve.

U.S. 20170039689 discloses “deep learning” technologies for assessing DR from images. See also, U.S. 20170039412, 20150110348, 20150110368, 20150110370, 20150110372, 9,008,391, 9,002,085, 8,885,901, 8,879,813, and 20170046616. Retinal images from a funduscope provide an indication of vascular conditions, and may be useful for diagnosing both eye disease and, more generally, vascular disease.

“Deep learning” is a refinement of the artificial NN (“ANN”), consisting of more than one hidden layer, which permits higher levels of abstraction and improved predictions from data [2]. See, Greenspan, Hayit, Bram van Ginneken, and Ronald M. Summers. “Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique.” IEEE Transactions on Medical Imaging 35.5 (2016): 1153-1159. Convolutional neural networks (“CNN”) are powerful tools for computer vision tasks. Deep CNNs can be formulated to automatically learn mid-level and high-level abstractions obtained from raw data (e.g., images). Generic descriptors extracted from CNNs are effective in object recognition and localization in natural images.

In machine learning, there are two paradigms: supervised learning and unsupervised learning. In supervised learning, the training data is labelled; that is, there is an extrinsic truth, such that statistical learning is tethered to an external standard. In unsupervised learning, the training data is analyzed without reference to a ground truth, and therefore the features are intrinsic. In semi-supervised learning techniques, aspects of both paradigms are employed; that is, not all samples (or attributes of samples) of the training data set are labelled.

In medical imaging, the accurate diagnosis and/or assessment of a disease depends on both image acquisition and image interpretation. Image acquisition has improved substantially over recent years, with devices acquiring data at faster rates and increased resolution. The present technology therefore addresses the image interpretation process. Most interpretations of medical images are performed by physicians; however, image interpretation by humans is limited due to its subjectivity, large variations across interpreters, and fatigue.

Human examiners may be inconsistent in their interpretations, and prone to error. Further, different humans may have different standards. Therefore, even obtaining training data is not without difficulty.

Many diagnostic tasks require an initial search process to detect abnormalities, and to quantify measurements and changes over time.

CNNs have been applied to medical image processing; see Sahiner et al. [4]. ROIs containing either biopsy-proven masses or normal tissues were extracted from mammograms. The CNN consisted of an input layer, two hidden layers and an output layer, and used backpropagation for training. CNNs have also been applied to lung nodule detection [5], and to microcalcifications on mammography [6].

The typical CNN architecture for image processing consists of a series of layers of convolution filters, interspersed with a series of data reduction or pooling layers. The convolution filters are applied to small patches of the input image. Like the low-level vision processing in the human brain, the convolution filters detect increasingly more relevant image features, for example lines or circles that may represent straight edges (such as for organ detection) or circles (such as for round objects like colonic polyps), and then higher order features like local and global shape and texture. The output of the CNN is typically one or more probabilities or class labels. The convolution filters are learned from training data. This is desirable because it reduces the necessity of the time-consuming hand-crafting of features that would otherwise be required to pre-process the images with application-specific filters or by calculating computable features. There are other network architecture variants, such as a deep recurrent neural network (“DRNN”) known as long short-term memory [7].
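
For illustration, the following is a minimal sketch of such a patch-classification CNN in PyTorch. The layer counts, filter sizes, and the 64×64 input patch size are assumptions chosen for illustration, not parameters taken from this disclosure.

    import torch
    import torch.nn as nn

    class PatchCNN(nn.Module):
        """Convolution filters interspersed with pooling, then fully connected layers."""
        def __init__(self, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level filters (edges, spots)
                nn.ReLU(),
                nn.MaxPool2d(2),                              # data reduction / pooling layer
                nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-order shape/texture features
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 16 * 16, 64),  # assumes 64x64 input patches, pooled twice to 16x16
                nn.ReLU(),
                nn.Linear(64, num_classes),   # one output per class label
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.classifier(self.features(x))

Class probabilities, where desired, are obtained by applying a softmax to the network output.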

CNNs are parallelizable algorithms, and acceleration of processing (approximately 40 times) is enabled by graphics processing unit (GPU) computer chips, compared to CPU processing alone. See [8]. In medical image processing, GPUs are usable for segmentation, reconstruction, registration, and machine learning [9], [10].

Training a deep CNN (“DCNN”) from scratch (or full training) is a challenge. First, CNNs require a large amount of labeled training data, a requirement that may be difficult to meet in the medical domain, where expert annotation is expensive and the diseases (e.g., lesions) are scarce. Second, training a DCNN requires large computational and memory resources, without which the training process would be extremely time-consuming. Third, training a DCNN is often complicated by overfitting and convergence issues, which often require repetitive adjustments in the architecture or learning parameters of the network to ensure that all layers are learning with comparable speed. Given these difficulties, several new learning schemes, termed “transfer learning” and “fine-tuning”, are shown to provide solutions and are increasingly gaining popularity. However, empirical adjustments to the process characterize the state of the art, with significant differences in approach, implementation, and results observed in the art even when using similar algorithms for similar goals. See www.kaggle.com/c/diabetic-retinopathy-detection.

Note that the training of a network cannot be updated, and any change in the algorithm or training data requires re-optimization of the entire network. A new technique provides some mitigation of this constraint. See, U.S. Pat. No. 9,053,431, expressly incorporated herein by reference. Thus, the network may be supplemented after definition, based on a noise or error vector output of the network.

Computer-aided detection (CAD) is a well-established area of medical image analysis that is highly amenable to deep learning. In the standard approach to CAD [11], candidate lesions are detected, either by supervised methods or by classical image processing techniques such as filtering and mathematical morphology. Candidate lesions are often segmented, and described by an often large set of hand-crafted features. A classifier is used to map the feature vectors to the probability that the candidate is an actual lesion. The straightforward way to employ deep learning instead of hand-crafted features is to train a CNN operating on a patch of image data centered on the candidate lesion. A combination of different CNNs is used to classify each candidate.

Various works focus on supervised CNNs in order to achieve categorization. Such networks are important for many applications, including detection, segmentation and labelling. Other works focus on unsupervised schemes, which are mostly shown to be useful in image encoding, efficient image representation schemes, and as a pre-processing step for further supervised schemes. Unsupervised representation learning methods such as Restricted Boltzmann Machines (RBM) may outperform standard filter banks because they learn a feature description directly from the training data. The RBM is trained with a generative learning objective; this enables the network to learn representations from unlabeled data, but does not necessarily produce features that are optimal for classification. Van Tulder et al. [18] conducted an investigation to combine the advantages of both generative and discriminative learning objectives in a convolutional classification restricted Boltzmann machine, which learns filters that are good both for describing the training data and for classification. It is shown that a combination of learning objectives outperforms purely discriminative or generative learning.

CNNs enable learning of data-driven, highly representative, layered hierarchical image features. These features have been demonstrated to be a very strong and robust representation in many application domains, as presented in the literature. In order to provide such a rich representation and successful classification, sufficient training data are needed.

When sufficient data are not available, there are several ways to proceed:

Transfer learning: CNN models (supervised) pre-trained from a natural image dataset or from a different medical domain are used for a new medical task at hand. In one scheme, a pre-trained CNN is applied to an input image and then the outputs are extracted from layers of the network. The extracted outputs are considered features, and are used to train a separate pattern classifier. For instance, in Bar et al. [25], [26], pre-trained CNNs were used as a feature generator for chest pathology identification. In Ginneken et al. [27], integration of CNN-based features with handcrafted features enabled improved performance in a nodule detection system.
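
The following is a sketch of the feature-generator scheme just described, in which an ImageNet-pretrained backbone is truncated and its outputs feed a separate classifier. The specific model (resnet18), the classifier (an SVM), and the placeholder inputs train_images and train_labels are illustrative assumptions.

    import torch
    import torchvision.models as models
    from sklearn.svm import SVC

    # Pre-trained CNN with its final classification layer removed, so the
    # 512-dimensional penultimate outputs serve as features (torchvision >= 0.13 API).
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    backbone.fc = torch.nn.Identity()
    backbone.eval()

    with torch.no_grad():
        feats = backbone(train_images).numpy()  # train_images: assumed N x 3 x 224 x 224 tensor

    # A separate pattern classifier is trained on the extracted features.
    clf = SVC(probability=True).fit(feats, train_labels)  # train_labels: assumed length-N array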

Fine Tuning: When a medium-sized dataset does exist for the task at hand, one suggested scheme is to use a pre-trained CNN as initialization of the network, following which further supervised training is conducted of several (or all) of the network layers, using the new data for the task at hand. Transfer learning and fine tuning are key components in the use of DCNNs in medical imaging applications.
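
A sketch of fine-tuning under the same assumptions: the pre-trained weights initialize the network, early layers are frozen (shallow fine-tuning), and later blocks are optionally unfrozen as well (deep fine-tuning). The hyperparameters are placeholders.

    import torch
    import torchvision.models as models

    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    for p in model.parameters():
        p.requires_grad = False          # freeze all pre-trained layers
    model.fc = torch.nn.Linear(512, 2)   # new task-specific head (trainable by default)
    for p in model.layer4.parameters():
        p.requires_grad = True           # deep fine-tuning: also retrain the last block

    # Train only the layers left unfrozen, using the new dataset.
    optimizer = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)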

Shin et al. [17] and Tajbakhsh et al. [28] show that using pre-trained CNNs with fine-tuning achieved the strongest results, and that deep fine-tuning led to improved performance over shallow fine-tuning. The importance of using fine-tuning increases with reduced-size training sets, regardless of the specific application domain (Tajbakhsh et al.), and for all network architectures (Shin et al. [17]). The GoogLeNet architecture led to state-of-the-art detection of mediastinal lymph nodes compared to other, less deep architectures; see Shin et al.

The lack of publicly available ground-truth data, and the difficulty in collecting such data per medical task, both cost-wise and time-wise, is a prohibitively limiting factor in the medical domain. Though crowdsourcing has enabled annotation of large scale databases for real world images, its application for biomedical purposes requires a deeper understanding, and hence a more precise definition, of the actual annotation task (Nguyen et al. [29], McKenna et al. [30]). The fact that expert tasks are being outsourced to non-expert users may lead to noisy annotations introducing disagreement between users. Likewise, use of uncontrolled clinical outcome data, or user data as a basis for feedback, may lead to poor outcomes, due to lack of standardization and control. Many issues arise in combining the knowledge of medical experts with non-professionals, such as how to combine the information sources, how to assess and incorporate the inputs weighted by their prior-proved accuracy in performance, and more. Albarqouni et al. [31] present a network that combines an aggregation layer that is integrated into the CNN to enable learning inputs from the crowds as part of the network learning process. Results shown give valuable insights into the functionality of DCNN learning from crowd annotations. Crowdsourcing studies in the medical domain show that a crowd of nonprofessional, inexperienced users can in fact perform as well as the medical experts, as observed by Nguyen et al. [29] and McKenna et al. [30] for radiology images. This perhaps means that the results do not capture and exploit the skill, knowledge and insight of the experts.

Unsupervised feature learning for mammography risk scoring is presented in Kallenberg et al. [32]. In this work, a method is shown that learns a feature hierarchy from unlabeled data. The learned features are then input to a simple classifier, and two different tasks are addressed: i) breast density segmentation, and ii) scoring of mammographic texture, with state-of-the-art results achieved. To control the model capacity, a sparsity regularizer is introduced that incorporates both lifetime and population sparsity. The convolutional layers in the unsupervised parts are trained as autoencoders; in the supervised part, the (pre-trained) weights and bias terms are fine-tuned using softmax regression.

Yan et al. [33] design a multi-stage deep learning framework for image classification and apply it to body part recognition. In the pre-train stage, a CNN is trained using multi-instance learning to extract the most discriminative and non-informative local patches from the training slices. In the boosting stage, the pre-trained CNN is further boosted by these local patches for image classification. A hallmark of the method was that it automatically discovered the discriminative and non-informative local patches through multi-instance deep learning. Thus, no manual annotation was required.

Regression networks are not very common in the medical imaging domain. In Miao et al. [34], a CNN regression approach is presented for real-time 2-D/3-D registration. Three algorithmic strategies are proposed to simplify the underlying mapping to be regressed, and to design a CNN regression model with strong non-linear modelling. Results show that the discriminative local (DL) method is more accurate and robust than two state-of-the-art accelerated intensity-based 2-D/3-D registration methods.

Golkov et al. [35] provide an initial proof-of-concept, applying DL to reduce diffusion MRI data processing to a single optimized step. They show that this modification enables one to obtain scalar measures from advanced models at twelve-fold reduced scan time, and to detect abnormalities without using diffusion models. The relationship between the diffusion-weighted signal and microstructural tissue properties is non-trivial. Golkov et al. [35] demonstrate that with the use of a DNN such relationships may in fact be revealed: DWIs are directly used as inputs rather than using scalar measures obtained from model fitting. The work shows microstructure prediction on a voxel-by-voxel basis, as well as automated model-free segmentation from DWI values into healthy tissues and MS lesions. Diffusion kurtosis is shown to be measured from only 12 data points, and neurite orientation dispersion and density measures from only 8 data points. This may allow for fast and robust protocols facilitating clinical routine, and demonstrates how classical data processing can be streamlined by means of deep learning.

Kaggle (www.kaggle.org) organized a competition on detection and staging of DR from color fundus images, and around 80,000 images were made available (www.kaggle.com/c/diabetic-retinopathy-detection). Many proposals used NNs.

U.S. 2014/0314288 discloses a three stage system that analyzes fundus images with varying illumination and fields of view and generates a severity grade for DR. Image pre-processing includes histogram equalization, contrast enhancement, and dynamic range normalization. In the first stage, bright and red regions are extracted from the fundus image using various combinations of global filters and thresholding techniques. An optic disc (OD) has a similar structural appearance to bright lesions, and the blood vessel regions have similar pixel intensity properties to the red lesions. Hence, the region corresponding to the optic disc is removed from the bright regions using existing optic disc detection algorithms, and the regions corresponding to the blood vessels are removed from the red regions using a simple detection technique. This leads to an image containing bright candidate regions and another image containing red candidate regions. Region-based features are computed, including area, perimeter, solidity, min, max, mean, standard deviation, etc.; in all, 30 features are used, selected by AdaBoost (en.wikipedia.org/wiki/AdaBoost). In the second stage, the bright and red candidate regions are subjected to two-step hierarchical classification. In the first step, bright and red lesion regions are separated from non-lesion regions based on the features. In the second step, the classified bright lesion regions are further classified as hard exudates or cotton-wool spots, while the classified red lesion regions are further classified as hemorrhages and micro-aneurysms. The classifier may take the form of a GMM (en.wikipedia.org/wiki/Mixture_model#Gaussian_mixture_model), kNN (en.wikipedia.org/wiki/K-nearest_neighbors_algorithm), SVM (en.wikipedia.org/wiki/Support_vector_machine, en.wikipedia.org/wiki/Supervised_learning), etc. In the third stage, the numbers of bright and red lesions per image are combined to generate a DR severity grade. Such a system aims at reducing the number of patients requiring manual assessment, and at prioritizing eye-care delivery measures for patients with the highest DR severity.

U.S. Pat. No. 8,098,907 discloses automatic detection of micro-aneurysms (MAs) by also taking into account information such as location of vessels, optic disc and hard exudates (HEs). The method works by (i) dividing the image into subregions of fixed size, followed by region enhancement/normalization and adaptive analysis, (ii) optic disc, vessel, and HE detection, and (iii) combination of the results of (i) and (ii) above. MA detection is based on top hat filtering and local adaptive thresholding.

U.S. 2015/0104087 discloses automated fundus image side (i.e., left or right eye) detection, field detection, and quality assessment. The technology uses physiological characteristics such as location of the optic disc and macula, and the presence of symmetry in the retinal blood vessel structure. Field and side detection are multi-channel or single channel. The algorithm detects the optic disc using normalized 2D cross-correlation with an optic disc template. For high-resolution images, blood vessel structure density is used to determine side and field. The quality of each image is assessed by analysis of the vessel symmetry in the image. This is done by obtaining a binary segmented image of the blood vessels, extracting features, and applying a classification algorithm. Quality is a grade of 1 to 5. For vessel binarization, wavelets and edge location refinement are proposed. This is followed by morphological operations. For feature extraction, the image is divided into 20 rectangular windows, and local vessel density (LVD) is computed as the number of non-zero pixels in each window. The 20 LVDs are normalized by a global vessel density (GVD) computed as the number of non-zero pixels for a segmented binary image of grade 5. A feature vector is formed by the LVDs and GVD of the image. For classification, SVM is proposed. Image registration and techniques similar to the above can be used to assign quality levels to overlapping stitched images.

U.S. Pat. Nos. 8,879,813, 8,885,901, 9,002,085, 9,008,391, U.S. 2015/0110348, U.S. 2015/0110368, U.S. 2015/0110370, U.S. 2015/0110372, U.S. 2017/0039412, and U.S. 2017/0039689 disclose various aspects of a diagnostic system. A retinal image may be enhanced based on median normalization, to locally enhance the image at each pixel location using local background estimation. Active pixels are detected in retina images, based on median filtering/dilation/erosion/etc. Essentially, this detects the retina disc and eliminates pixels close to the border. Regions of interest are detected. Descriptors of local regions are extracted, e.g., by computing two morphologically filtered images with the morphological filter computed over geometric shaped local regions of two different types or sizes, and taking their difference. Image quality is assessed using computer vision techniques to assess appropriateness for grading. Images are automatically screened for diseases. Image-based lesion biomarkers are automatically analyzed, over different visits of a patient, with image registration between visits. Changes in lesions and anatomical structures are computed, and quantified in terms of statistics, wherein the computed statistics represent the image-based biomarker that can be used for monitoring progression, early detection, and/or monitoring effectiveness of treatment therapy.

U.S. Pat. No. 8,879,813 describes automated detection of active pixels in retina images by accessing a retina image, generating two median filtered versions with different window sizes, generating a difference image from the filtered versions, and then generating a binary image.

U.S. Pat. No. 8,885,901 describes enhancing a retina image by accessing a retina image, filtering it with a median filter, and modifying the values in the original image based on the values in the original and filtered images, wherein the enhanced image is used for detecting a medical condition.

U.S. Pat. No. 9,002,085 and U.S. 2015/0110372 describe generating descriptors of local regions in a retina image by accessing a retina image, generating two morphologically filtered versions, the first with a circular/regular polygon window and the second with an elongated/elliptical window, generating difference values from the filtered images, and using them as pixel descriptor values for the retina image.

U.S. Pat. No. 9,008,391 and U.S. 2015/0110368 describe accessing retina images for a patient, for each of the images designating a subset of pixels as active including regions of interest, computing pixel-level descriptors, providing pixel-level classification from the descriptors using supervised learning, computing a second descriptor, and providing a second classification for a plurality of pixels using supervised learning.

U.S. 2015/0110348 and U.S. 2017/0039412 describe automated detection of regions of interest (ROIs) in retina images by accessing a retina image, extracting regions with one or more desired properties using multiscale morphological filterbank analysis, and storing a binary map.

U.S. 2015/0110370 and U.S. 2017/0039689 describe enhancing a retinal image by accessing a funduscopic image, estimating the background at a single scale or multiple scales, and scaling the intensity at a first pixel location adaptively based on the intensity at the same position in the background image.

REFERENCES

- [1] MIT Technol. Rev., 2013. Available: www.technologyreview.com/s/513696/deep-learning
- [2] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436-444, 2015.
- [3] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Netw., vol. 61, pp. 85-117, 2015.
- [4] B. Sahiner et al., “Classification of mass and normal breast tissue: A convolution neural network classifier with spatial domain and texture images,” IEEE Trans. Med. Imag., vol. 15, no. 5, pp. 598-610, Oct. 1996.
- [5] S. C. B. Lo, J. S. J. Lin, M. T. Freedman, and S. K. Mun, “Computer-assisted diagnosis of lung nodule detection using artificial convolution neural-network,” Proc. SPIE Med. Imag., Image Process., vol. 1898, pp. 859-869, 1993.
- [6] H.-P. Chan, S.-C. Lo, B. Sahiner, K. L. Lam, and M. A. Helvie, “Computer-aided detection of mammographic microcalcifications: Pattern recognition with an artificial neural network,” Med. Phys., vol. 22, no. 10, pp. 1555-67, 1995.
- [7] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
- [8] G. E. Hinton, S. Osindero, and Y. W. Teh, “A fast learning algorithm for deep belief nets,” Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
- [9] D. Castano-Diez, D. Moser, A. Schoenegger, S. Pruggnaller, and A. S. Frangakis, “Performance evaluation of image processing algorithms on the GPU,” J. Struct. Biol., vol. 164, no. 1, pp. 153-160, 2008.
- [10] A. Eklund, P. Dufort, D. Forsberg, and S. M. LaConte, “Medical image processing on the GPU: Past, present and future,” Med. Image Anal., vol. 17, no. 8, pp. 1073-94, 2013.
- [11] B. van Ginneken, C. M. Schaefer-Prokop, and M. Prokop, “Computer-aided diagnosis: How to move from the laboratory to the clinic,” Radiol., vol. 261, no. 3, pp. 719-732, 2011.
- [12] A. Setio et al., “Pulmonary nodule detection in CT images using multiview convolutional networks,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1160-1169, May 2016.
- [13] H. Roth et al., “Improving computer-aided detection using convolutional neural networks and random view aggregation,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1170-1181, May 2016.
- [14] Q. Dou et al., “Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1182-1195, May 2016.
- [15] K. Sirinukunwattana et al., “Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1196-1206, May 2016.
- [16] M. Anthimopoulos, S. Christodoulidis, A. Christe, and S. Mougiakakou, “Lung pattern classification for interstitial lung diseases using a deep convolutional neural network,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1207-1216, May 2016.
- [17] H.-C. Shin et al., “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285-1298, May 2016.
- [18] G. van Tulder and M. de Bruijne, “Combining generative and discriminative representation learning in convolutional restricted Boltzmann machines,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1262-1272, May 2016.
- [19] A. Depeursinge et al., Comput. Med. Imag. Graph., vol. 36, no. 3, pp. 227-238, 2012.
- [20] F. Ghesu et al., “Marginal space deep learning: Efficient architecture for volumetric image parsing,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1217-1228, May 2016.
- [21] T. Brosch et al., “Deep 3D convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1229-1239, May 2016.
- [22] S. Pereira, A. Pinto, V. Alves, and C. Silva, “Brain tumor segmentation using convolutional neural networks in MRI images,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1240-1251, May 2016.
- [23] P. Moeskops et al., “Automatic segmentation of MR brain images with a convolutional neural network,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1252-1261, May 2016.
- [24] M. van Grinsven, B. van Ginneken, C. Hoyng, T. Theelen, and C. Sanchez, “Fast convolutional neural network training using selective data sampling: Application to hemorrhage detection in color fundus images,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1273-1284, May 2016.
- [25] Y. Bar, I. Diamant, L. Wolf, and H. Greenspan, “Deep learning with non-medical training used for chest pathology identification,” Proc. SPIE Med. Imag. Computer-Aided Diagnosis, vol. 9414, 2015.
- [26] Y. Bar, I. Diamant, L. Wolf, S. Lieberman, E. Konen, and H. Greenspan, “Chest pathology detection using deep learning with non-medical training,” in Proc. IEEE 12th Int. Symp. Biomed. Imag., 2015, pp. 294-297.
- [27] B. van Ginneken, A. A. Setio, C. Jacobs, and F. Ciompi, “Off-the-shelf convolutional neural network features for pulmonary nodule detection in computed tomography scans,” in Proc. IEEE 12th Int. Symp. Biomed. Imag., 2015, pp. 286-289.
- [28] N. Tajbakhsh et al., “Convolutional neural networks for medical image analysis: Full training or fine tuning?,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1299-1312, May 2016.
- [29] T. B. Nguyen et al., “Distributed human intelligence for colonic polyp classification in computer-aided detection for CT colonography,” Radiology, vol. 262, no. 3, pp. 824-833, 2012.
- [30] M. T. McKenna et al., “Strategies for improved interpretation of computer-aided detections for CT colonography utilizing distributed human intelligence,” Med. Image Anal., no. 6, pp. 1280-1292, 2012.
- [31] S. Albarqouni, C. Baur, F. Achilles, V. Belagiannis, S. Demirci, and N. Navab, “Agg-Net: Deep learning from crowds for mitosis detection in breast cancer histology images,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1313-1321, May 2016.
- [32] M. Kallenberg et al., “Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1322-1331, May 2016.
- [33] Z. Yan et al., “Multi-instance deep learning: Discover discriminative local anatomies for bodypart recognition,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1332-1343, May 2016.
- [34] S. Miao, Z. J. Wang, and R. Liao, “A CNN regression approach for real-time 2D/3D registration,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1352-1363, May 2016.
- [35] V. Golkov et al., “q-Space deep learning: Twelve-fold shorter and model-free diffusion MRI scans,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1344-1351, May 2016.
- [36] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv:1512.03385, 2015, to be published.
- [37] H. R. Roth et al., “A new 2.5D representation for lymph node detection in CT,” Cancer Imag. Arch., 2015. dx.doi.org/10.7937/K9/TCIA.2015.AQIIDCNM
- [38] H. R. Roth et al., “Data from pancreas-CT,” Cancer Imag. Arch., 2016 [Online]. Available: dx.doi.org/10.7937/K9/TCIA.2016.tNB1kqBU

See, U.S. Pat. Nos. 5,784,162; 6,088,099; 6,198,532; 6,276,798; 6,419,361; 6,556,853; 6,895,264; 6,943,153; 6,992,775; 7,433,532; 7,474,775; 7,712,898; 7,931,902; 8,098,907; 8,114,843; 8,194,936; 8,207,396; 8,303,115; 8,309,350; 8,340,437; 8,705,826; 8,787,638; 8,879,813; 8,885,901; 8,896,682; 9,002,085; 9,008,391; 9,097,707; 9,545,196; 20010033364; 20020052551; 20020194630; 20030129164; 20040221855; 20040254154; 20060257031; 20070128662; 20080124344; 20100061601; 20100172871; 20100220906; 20100302507; 20110097330; 20110173708; 20110242306; 20120190094; 20120257164; 20120272341; 20130108131; 20130116670; 20130137113; 20130301889; 20140023692; 20140199277; 20140276025; 20150110348; 20150110368; 20150110370; 20150110372; 20150124216; 20150224193; 20150238767; 20150379708; 20160120404; 20160217586; 20170039412; 20170039689; and 20170046616.

SUMMARY OF THE INVENTION

The present invention provides a system and method for automatically predicting the presence of vascular disease based on one or more vascular images. The images are preferably funduscopic images, and the vascular disease would then be a retinal disease, or retinopathy. In particular, screening for DR is a particular target for application of the technology.

The underlying methodology is not limited to retinal images, though the visibility of the retinal vasculature in a funduscopic image makes this particularly preferred. However, other vasculature and microvasculature may be analyzed. For example, capillaries of buccal and sublingual mucosa, intestinal mucosa, sclera, etc., can be imaged under various circumstances.

Retinopathy may be diagnosed as:

- (H35.0) Hypertensive retinopathy: burst blood vessels, due to long-term high blood pressure (en.wikipedia.org/wiki/Hypertensive_retinopathy);
- (H35.0/E10-E14) Diabetic retinopathy: damage to the retina caused by complications of diabetes mellitus, which could eventually lead to blindness (en.wikipedia.org/wiki/Diabetic_retinopathy);
- (H35.0-H35.2) Retinopathy: general term referring to non-inflammatory damage to the retina (en.wikipedia.org/wiki/Retinopathy);
- (H35.1) Retinopathy of prematurity: scarring and retinal detachment in premature babies (en.wikipedia.org/wiki/Retinopathy_of_prematurity);
- (H35.3) Age-related macular degeneration: the photosensitive cells in the macula malfunction and over time cease to work;
- (H35.3) Macular degeneration: loss of central vision, due to macular degeneration (e.g., Bull's Eye Maculopathy, chloroquine retinopathy, en.wikipedia.org/wiki/Chloroquine_retinopathy) (en.wikipedia.org/wiki/Macular_degeneration);
- (H35.3) Epiretinal membrane: a transparent layer forms and tightens over the retina (en.wikipedia.org/wiki/Epiretinal_membrane);
- (H35.4) Peripheral retinal degeneration (en.wikipedia.org/wiki/Lattice_degeneration);
- (H35.5) Hereditary retinal dystrophy (en.wikipedia.org/wiki/Progressive_retinal_atrophy);
- (H35.5) Retinitis pigmentosa: genetic disorder; tunnel vision preceded by night-blindness (en.wikipedia.org/wiki/Retinitis_pigmentosa);
- (H35.6) Retinal haemorrhage (en.wikipedia.org/wiki/Retinal_haemorrhage);
- (H35.7) Separation of retinal layers (e.g., central serous retinopathy, en.wikipedia.org/wiki/Central_serous_retinopathy; retinal detachment: detachment of retinal pigment epithelium, en.wikipedia.org/wiki/Retinal_detachment) (en.wikipedia.org/wiki/Retina#Anatomy_of_vertebrate_retina);
- (H35.8) Other specified retinal disorders;
- (H35.81) Macular edema: distorted central vision, due to a swollen macula (en.wikipedia.org/wiki/Macular_edema); and
- (H35.9) Retinal disorder, unspecified.

See, World Health Organization ICD-10 codes: Diseases of the eye and adnexa (H00-H59).

The images may be analyzed for various diseases concurrently. For example, the training data may encode signs of at least: cotton wool spots (areas of retinal ischemia with edema); hard exudates (fatty deposits); micro-aneurysms (appearing as small red dots on the retina); small flame hemorrhages from damaged blood vessel walls; neovascularization, i.e., signs of new blood vessel formation on the back of the eye; early cataracts, also a consequence of diabetes, due to excess glucose interfering with the metabolism of the crystalline lens; and other clinical signs. See, patient.info/doctor/eye-in-systemic-disease; www.ncbi.nlm.nih.gov/books/NBK221/.

More particularly, the invention comprises a system and method for the automatic detection and localization of DR lesions in retinal fundus images, and the automatic DR classification of retinal fundus images. The system recognizes DR based on training data representing classification of regions of patient funduscopic images by experts.

In some embodiments, a computer-implemented method includes analyzing a retinal image and detecting areas of interest in said retinal image. In some embodiments, the method further includes classifying a retinal image based on the detection of the one or more areas of interest.

The present technology provides a new approach to the problem of automatically detecting retinopathy in retinal images. The idea is to collect many thousands of retinal photos which have had all retinopathy marked by human experts, wherein the annotations are validated as being reliable. The annotations take the form of regional assessment of medical classification, as well as image quality. The image quality may also be determined, at least in part, by an unsupervised NN, since the image itself is governed by optical principles. However, some aspects may benefit from human expertise.

The annotations are used to train a NN, which may be a CNN or DNN. Because of the regional classification of the training data, the network according to the present technology determines a classification of respective regions of a previously unclassified image. This is in contrast to technologies that classify unknown images as a whole. It is therefore apparent that the NN itself is fundamentally dependent on both the labels and what is being labelled.

The training data employed may be the entire retinal image with regional classifications associated as labels, though this increases the degrees of freedom of the input data set, which may complicate the problem without corresponding improvement of network performance. That is, retinopathy in one region of the retina is very similar to retinopathy in other regions, and therefore the training need not be region-specific. Of course, there may indeed be region-specificity, and the present technology does not ignore or disclaim this, but rather finds that adequate performance is achievable while making the region-independent presumption. For example, in a 1000×1000 input image, there are 10⁶ pixels, and without simplification, a NN would have to have at least 10⁶ input neurons. On the other hand, 64×64 pixel regions provide sufficient precision to identify pathology, requiring at least 4096 input neurons, about 250× fewer. Likewise, the region-independent presumption may permit elimination of at least an entire hidden layer of the network that would otherwise be needed to process regional differences.

Prior to processing, the image may be normalized with respect to color, brightness, etc., in a deterministic statistical or rule-based process. While this normalization may be achieved within the NN, it can also be achieved independent of the network, leading to a simpler implementation of the network.

In some cases, the regions may be normalized after segmentation, or as part of the segmentation process. This permits segmentation to be based on a statistical or rule-based process that establishes regional boundaries on a basis other than shape or space. Again, normalization preceding the NN simplifies the NN implementation, but also relieves the NN of learning the pre-normalized features. By employing expert determination of features that may be extracted from the image by a normalization process, the specificity of the NN for the features which are not normalized is increased.

Each retinal image is highly individual, like a fingerprint, so the dataset will never contain an exact match to the whole image. However, the early disease processes targeted by this technology are characterized by being small and localized, a few pixels wide in a high-resolution image. In fact, the reason digital photos have only recently been clinically accepted for screening is that the resolution required to see the earliest signs of disease has only recently been easily available. So the dataset is actually a set of small patches taken from real retinal photos, and the system according to the present technology compares small patches from the new image to the patches in the dataset.

At a high level, a new image is processed as follows:

Accept the raw image from the camera, and standardize its size to 1024×1024 with a circular mask.
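
A sketch of this step follows, assuming the retinal disc center (cx, cy) and radius r have already been found (disc detection is discussed in the Detailed Description): crop the disc, resize to 1024×1024, and zero out pixels outside the circular mask.

    import cv2
    import numpy as np

    def standardize(image: np.ndarray, cx: int, cy: int, r: int, p: int = 1024) -> np.ndarray:
        disc = image[cy - r:cy + r, cx - r:cx + r]     # bounding square of the retinal disc
        disc = cv2.resize(disc, (p, p))                # disc now has diameter p
        mask = np.zeros((p, p), dtype=np.uint8)
        cv2.circle(mask, (p // 2, p // 2), p // 2, 255, thickness=-1)
        return cv2.bitwise_and(disc, disc, mask=mask)  # zero everything outside the circle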

The NN is implemented to provide a classifier, which operates on the regions. Preferably, the same preprocessing that is applied to the labelled dataset images is applied to unknown images, including the segmentation into regions. The segmented regions, after normalization, are then classified based on the training data. The classifications are then transferred back to the unknown image as a whole, to annotate the unknown image, which is then diagnosed or graded based on the full set of classifications.
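
A sketch of the final aggregation step: the per-region probability vectors are thresholded into annotations, and an image-level grade is derived from them. The threshold and counting rule here are illustrative assumptions, not the disclosed grading scheme.

    import numpy as np

    def grade_image(region_probs: np.ndarray, diseased: int = 1, thresh: float = 0.5) -> int:
        """region_probs: n x l array, one class-probability vector per region."""
        flagged = region_probs[:, diseased] > thresh   # regions annotated as diseased
        n = int(flagged.sum())
        if n == 0:
            return 0   # no retinopathy detected
        if n < 5:
            return 1   # mild; cut-off assumed for illustration
        return 2       # refer for further review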

The grading may be accompanied by a reliability, which may include an indication of type I (false positive) and/or type II (false negative) errors, and perhaps other types of errors. The grading reliability may also be responsive to original image quality, such as brightness, focus, contrast, color, etc. Where, for example, a portion of the image is of poor quality, the portion that is available for assessment may be analyzed, with a grade and accompanying reliability that assesses the probability of a missed disease-positive diagnosis, and a probability of an erroneous disease-positive diagnosis.

Results can optionally be collated into an overall recommendation, or details can be supplied together with notes about particular areas of concern.

This technique requires collection of sufficient images, suitably assessed by human experts, to form the dataset. The matching algorithm needs to be sufficiently accurate and precise to avoid false positive and false negative results.

Retinal camera models produce images with various resolutions, and the typical retinal lesion will vary in size too. After significant experimentation, reference to experts, and UK NHS standards documentation, it was concluded that the smallest resolution which would retain sufficient features was 1024×1024 pixels, and the smallest patch in which a target lesion was still identifiable (on such an image) was 32×32 pixels.

The images may be provided in conjunction with additional medical data, such as blood sugar, age, sex, weight, clinical history, etc., and this may be used as part of the classification, though in some cases, it is better to classify the image, and permit a clinician to integrate the other patient data into a diagnosis and/or prognosis. Indeed, to the extent that a clinician uses the output of the classifier, using clinical data extrinsic to the image for classifying the image risks significant bias downstream, where these same factors may be considered again.

The precise balance needed between type I and type II errors, as outlined above, will depend on the intended use of the system. For a hospital setting, minimizing type II error may be the most important, while for general screening via opticians it might be most important to minimize type I error and avoid causing needless worry to customers.

It is noted that the system need not be limited to fully automated diagnosis. Thus, if the type II errors are minimized, false positives may then be rescreened by another method, such as a human grader, or an alternate or supplemental automated analysis, which, for example, might take more time and/or resources. For example, in cases where hash errors are responsible for diminished performance quality, an automated analysis using less data reduction may be employed as a follow-up. Any system, no matter how inherently good or bad, can be “tuned” to strike a different trade-off between these two types of error.
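
One common way to realize such tuning, sketched below under the assumption that the classifier emits disease probabilities, is to sweep the decision threshold and observe the resulting type I/type II trade-off on held-out data (val_scores and val_labels are assumed placeholders):

    import numpy as np

    def error_rates(scores: np.ndarray, labels: np.ndarray, threshold: float):
        predicted = scores > threshold
        type_i = float(np.mean(predicted[labels == 0]))    # false positive rate
        type_ii = float(np.mean(~predicted[labels == 1]))  # false negative rate
        return type_i, type_ii

    # A screening deployment might choose the highest threshold that keeps
    # type II error near zero, then rescreen the surviving positives.
    for t in np.linspace(0.1, 0.9, 9):
        print(t, error_rates(val_scores, val_labels, t))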

The algorithm was found to be highly effective at marking regions of the retinal image with indications of disease, demonstrating that the image processing/region matching technology was effective and efficient.

Widefield and ultra-widefield retinal images capture fields of view of the retina in a single image that are larger than the 45-50 degrees typically captured in retinal fundus images. These images are obtained either by using special camera hardware or by creating a montage using retinal images of different fields. The systems and methods described herein can apply to widefield and ultra-widefield images.

Fluorescein angiography involves injection of a fluorescent tracer dye followed by an angiogram that measures the fluorescence emitted by illuminating the retina with light of wavelength 490 nanometers. Since the dye is present in the blood, fluorescein angiography images highlight the vascular structures and lesions in the retina. The systems and methods described herein can apply to fluorescein angiography images.

It is therefore an object to provide a method of classifying vascular images, comprising: receiving a vascular image; normalizing the vascular image; segmenting the normalized vascular image into a plurality of regions; automatically determining a vasculopathy vector for the plurality of regions with at least one classifier comprising a NN; automatically annotating each of the respective plurality of regions, based on the determined vasculopathy vectors; and automatically grading the received vascular image based on at least the annotations, wherein the NN is trained based on at least an expert annotation of respective regions of vascular images, according to at least one objective classification criterion.

It is also an object to provide a system for classifying vascular images, comprising: an input configured to receive at least one vascular image; a memory configured to store information defining a NN trained based on at least an expert annotation of respective regions of a plurality of retinal images, according to at least one objective classification criterion; at least one automated processor, configured to: normalize the vascular image; segment the normalized vascular image into a plurality of regions; determine a vasculopathy vector for the plurality of regions with at least one classifier comprising the defined NN; annotate each of the respective plurality of regions, based on the determined vasculopathy vectors; and grade the received vascular image based on at least the annotations; and an output configured to communicate the grade.

It is a still further object to provide a computer readable medium, storing instructions for controlling at least one automated processor, comprising: instructions for normalizing a received vascular image; instructions for segmenting the normalized vascular image into a plurality of regions; instructions for determining a vasculopathy vector for the plurality of regions with at least one classifier comprising a NN; instructions for automatically annotating each of the respective plurality of regions, based on the determined vasculopathy vectors; and instructions for automatically grading the received vascular image based on at least the annotations, wherein the NN is trained based on at least an expert annotation of respective regions of vascular images, according to at least one objective classification criterion.

A further object provides a method of classifying retinal images for presence of retinopathy, comprising: receiving a funduscopic image; automatically normalizing the funduscopic image with respect to at least a tricolor stimulus image space; automatically segmenting the normalized funduscopic image into a plurality of regions based on at least one segmentation rule; automatically determining a retinopathy vector for the plurality of regions with at least one classifier comprising a NN, the NN being trained based on a training set comprising at least a set of funduscopic images having regional expert retinopathy annotations; automatically annotating each of the respective plurality of regions, based on the determined retinopathy vectors; automatically determining a grade of retinopathy of the received funduscopic image based on at least the annotations; and outputting the automatically determined grade of retinopathy.

Another object provides a method of classifying eye images, comprising: receiving and normalizing an eye image; segmenting the normalized eye image into a plurality of regions; automatically determining an eye disease vector for the plurality of regions with a classifier comprising a NN; automatically annotating each of the respective plurality of regions, based on the determined eye disease vectors; and automatically grading the received eye image based on at least the annotations, wherein the NN is trained based on at least an expert annotation of respective regions of eye images, according to at least one objective classification criterion.

A still further object provides a system for classifying eye images, comprising: an input configured to receive at least one eye image; a memory configured to store information defining a NN trained based on at least an expert annotation of respective regions of a plurality of retinal images, according to at least one objective classification criterion; at least one automated processor, configured to: normalize the eye image; segment the normalized eye image into a plurality of regions; determine an eye disease vector for the plurality of regions with at least one classifier comprising the defined NN; annotate each of the respective plurality of regions, based on the determined eye disease vectors; and grade the received eye image based on at least the annotations; and an output configured to communicate the grade.

An object provides a computer readable medium, storing instructions for controlling an automated processor, comprising: instructions for normalizing a received eye image; instructions for segmenting the normalized eye image into a plurality of regions; instructions for determining an eye disease vector for the plurality of regions with at least one classifier comprising a NN; instructions for automatically annotating each of the respective plurality of regions, based on the determined eye disease vectors; and instructions for automatically grading the received eye image based on at least the annotations, wherein the NN is trained based on at least an expert annotation of respective regions of eye images, according to at least one objective classification criterion.

A further object provides a method of classifying images of pathology, comprising: receiving an image; normalizing the image; segmenting the normalized image into a plurality of regions; automatically determining a disease vector for the plurality of regions with at least one classifier comprising a NN; automatically annotating each of the respective plurality of regions, based on the determined disease vectors; and automatically grading the received image based on at least the annotations, wherein the NN is trained based on at least an expert annotation of respective regions of images, according to at least one objective classification criterion.

The NN may comprise a plurality of hidden layers, a DNN, an RNN, and/or a CNN. The method may further comprise receiving a second vascular image acquired at a time different from the vascular image; and outputting information dependent on at least a change in the vascular image over time.

The NN may be further trained based on at least an expert annotation of a quality of respective regions of a retinal image.

The at least one classifier may comprise a multi-class support vector machine classifier, or a Gradient Boosting Classifier, for example.

The at least one classifier may classify a respective region based on the vasculopathy vectors of a plurality of respective regions, or a plurality of contiguous regions.

The classifier may comprise a classifier for determining a region having indicia of DR.

The grading may comprise grading a degree of DR. The grade may be accompanied by a probability of correctness of the grade. The probability of correctness may comprise a probability of a type I error and a probability of a type II error.

The method may further comprise outputting an image representing the classification of regions of the vascular image.

The vascular images may comprise funduscopic images. The annotations may comprise indications of DR within a respective region of a respective funduscopic image.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a flowchart according to an embodiment of the invention.

FIGS. 2A-2D show retinal images.

FIGS. 3 and 4 show a retinal image and round and square regions thereof, respectively.

FIGS. 5-23 show various flowcharts according to respective embodiments of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 illustrates an embodiment of the invention. Given a retinal fundus image I, block 100 performs image normalization to produce a normalized image I′. This normalization step involves identifying the retinal disc in I and resizing it to a fixed size, i.e., to a diameter of p pixels in an image of p×p pixels, where, typically, p=1024. This is illustrated in FIGS. 2A-2D, where FIGS. 2A and 2C show pre-normalized retinal fundus images I, and FIGS. 2B and 2D show the corresponding normalized retinal fundus images I′. Automatic detection of the retinal disc is a straightforward task which entails the detection of a relatively bright disc on a relatively dark and uniform background. As such, a number of well-known image processing techniques may be used, such as edge detection, circular Hough transform, etc. More details of such techniques may be found in Russ, John C., The Image Processing Handbook, CRC Press, 2016.
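
For illustration, a sketch of disc detection using OpenCV's circular Hough transform follows; the parameter values are assumptions that would need tuning per camera and resolution.

    import cv2
    import numpy as np

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # image: assumed BGR fundus photograph
    gray = cv2.medianBlur(gray, 5)                  # suppress noise before edge detection
    circles = cv2.HoughCircles(
        gray, cv2.HOUGH_GRADIENT, dp=2, minDist=gray.shape[0],
        param1=100, param2=50,
        minRadius=gray.shape[0] // 4, maxRadius=gray.shape[0] // 2)
    if circles is not None:
        cx, cy, r = np.round(circles[0, 0]).astype(int)  # strongest circle taken as the retinal disc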

Next, in block 110 of FIG. 1, retinal regions are extracted from the normalized image I′. This process entails extracting small retinal regions from the retinal disc. In some embodiments of the invention, these regions may be fully contained in the retinal disc, while in other embodiments of the invention, these regions may be partially contained in the retinal disc, for example containing at least 50% or 75% of retinal pixels. In some embodiments of the invention, these regions may be circular with a diameter of q pixels, where q=32 or q=64, while in other embodiments of the invention, these regions may be square with size q×q pixels, where q=32 or q=64. In alternative embodiments of the invention, a different region geometry may be used, for example a hexagonal region.

More generally, a contiguous region of pixels is defined, of arbitrary geometry. Indeed, in some cases, the region may be topologically discontiguous, or the analysis may be of the image information in a domain other than pixels.

FIG. 3 illustrates the extraction of two circular retinal regions R_α = R(x, y) centered on (x, y) = (579, 209) and R_β = R(x, y) centered on (x, y) = (603, 226) from a normalized image I′, while FIG. 4 illustrates the extraction of two square retinal regions R_α = R(x, y) centered on (x, y) = (757, 327) and R_β = R(x, y) centered on (x, y) = (782, 363) from a normalized image I′. In all cases, the regions are typically extracted in a regular pattern and with a significant degree of overlap; for example, starting at the top-most left-most region, extract successive regions by moving with a stride of s pixels to the right up to the top-most right-most region, then move s pixels down, extract successive regions from the left-most to the right-most moving right by s pixels, and so on. Typically, s takes a low value, i.e., typically between 1 and a few pixels. Different embodiments of the invention may use a higher value for s and/or may extract non-overlapping regions. In all cases, block 110 produces a set of n retinal regions R_0, . . . , R_(n-1), collectively denoted by R, with R_k = R(x, y) denoting a region centered on (x, y) of the normalized image I′.
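
A sketch of this extraction for square regions follows, assuming a binary (0/1) mask of retinal disc pixels from the normalization step; the values of q and s and the 75% retinal-pixel criterion follow the examples above.

    import numpy as np

    def extract_regions(img: np.ndarray, mask: np.ndarray, q: int = 64,
                        s: int = 2, min_retinal: float = 0.75):
        """Slide a q x q window over img with stride s, keeping mostly-retinal regions."""
        regions, centers = [], []
        for y in range(0, img.shape[0] - q + 1, s):
            for x in range(0, img.shape[1] - q + 1, s):
                if mask[y:y + q, x:x + q].mean() >= min_retinal:  # fraction of retinal pixels
                    regions.append(img[y:y + q, x:x + q])
                    centers.append((x + q // 2, y + q // 2))
        return np.stack(regions), centers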

We now consider the operation of block 120 of FIG. 1. Block 120 of FIG. 1 analyzes all the retinal regions independently, and for each region R_k produces a DR lesion classification vector D_k = (D_k^0, . . . , D_k^(l-1)), where l is the number of classes that the classifier of block 120 has been trained on, and D_k^c is the class membership probability of region k for class c.

As a first example, in one embodiment of the invention, the classifier in block 120 may be trained to classify regions to l=2 classes, with c=0 the “Healthy” class label and c=1 the “Diseased” class label.

As a second example, in a different embodiment of the invention, the classifier in block 120 may be trained to classify regions to l>2 classes, with c=0 the “Healthy” class label, c=1 the “Micro Aneurism” class label, c=2 the “Hard Exudate” class label, . . . , c=l−1 the “Laser Scar” class label. Thus, in such an embodiment, different types of lesions, artifacts or scars that may be present in the retinal fundus image are assigned separate class labels.

Thus, different embodiments of the invention may employ classifiers trained on different numbers of classes to achieve the highest possible performance in detecting DR lesions and/or identifying the types of detected DR lesions. In all cases, in alternative embodiments of the invention, the classifier in block 120 may be configured to output D_(k)^(c) as a binary 0 or 1 value rather than a class membership probability, in which case D_(k) is not a class probability distribution but a classification decision vector.

We now consider the architecture of block 120 of FIG. 1 in more detail. In one embodiment of the invention, block 120 of FIG. 1 comprises the architecture illustrated in FIG. 5. In block 500, retinal regions are optionally normalized. This optional normalization may, for example, involve mean subtraction, resulting in the mean intensity of each color channel of each retinal region taking a value of 0, or some other mean adjustment, resulting in the mean intensity of each color channel of each retinal region taking a predefined specific value, or some geometric normalization, for example rotation normalization, whereby each region is rotated so that its center of mass lies at a specific angle. Thus, for each retinal region R_(k), block 500 produces the normalized region R′_(k). Then, in block 510 a CNN is used to classify each region R′_(k) and produce a classification vector D_(k). The CNN has been previously trained using a large number of training samples, i.e., labelled training regions. As seen earlier, different embodiments of the invention may employ CNNs trained on only two classes, for example “Healthy” and “Diseased”, or more classes, for example “Healthy”, “Micro Aneurism”, “Hard Exudate” and so on, and, in all cases, the CNN may output D_(k) as a class probability distribution or as a classification decision vector.
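For example, the per-channel mean subtraction mentioned for block 500 reduces to a one-line operation (a sketch; the choice of float32 is an implementation assumption):

    import numpy as np

    def normalize_region(region: np.ndarray) -> np.ndarray:
        """Subtract the per-channel mean so each color channel of R'_k has mean 0."""
        region = region.astype(np.float32)
        return region - region.mean(axis=(0, 1), keepdims=True)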

FIG. 6 illustrates the architecture of block 510 of FIG. 5 in more detail. As can be seen in FIG. 6, each region R_(k) is processed independently to produce its classification vector D_(k). The CNN of FIG. 6 is a common architecture, employing a number of convolution layers, which convolve their input with learned convolution masks, interleaved with pooling layers, i.e. spatial subsampling layers, followed by a number of fully connected layers to produce the DR lesion classification vector D_(k) for region R_(k). The theory of CNNs (en.wikipedia.org/wiki/Convolutional_neural_network) is not covered here, but there are numerous publications on the topic, for example Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in Neural Information Processing Systems. 2012.
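A minimal PyTorch sketch of such a region classifier follows; the layer counts, filter sizes and hidden width are illustrative assumptions rather than the specific configuration of FIG. 6.

    import torch
    import torch.nn as nn

    class RegionCNN(nn.Module):
        """Convolution + pooling layers followed by fully connected layers (cf. FIG. 6)."""
        def __init__(self, q: int = 32, num_classes: int = 2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),   # pooling layer: spatial subsampling
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2))
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * (q // 4) ** 2, 128), nn.ReLU(),
                nn.Linear(128, num_classes))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Softmax turns the logits into the class membership probabilities D_k.
            return torch.softmax(self.classifier(self.features(x)), dim=1)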

In another embodiment of the invention, block 120 of FIG. 1 comprises the architecture illustrated in FIG. 7. Block 700 is optional and operates in the same fashion as block 500 of FIG. 5. Thus, for each retinal region R_(k), block 700 produces the normalized region R′_(k). Then, in block 710 a CNN is used to produce a high-dimensional vector of distinguishing features V_(k) for each region R′_(k).

FIG. 8 illustrates the architecture of block 710. As can be seen in FIG. 8, the CNN used to produce the feature vector V_(k) is the same as the CNN of FIG. 6, with V_(k) taken at the output of one of the layers preceding the final fully connected classification layer. Then, back to FIG. 7, in block 720 a classifier is used to classify each feature vector V_(k) and produce a classification vector D_(k). There are various options for the classifier of block 720, such as a binary or multi-class Support Vector Machine (SVM) or Gradient Boosting Classifier (GBC), previously trained on the feature vectors of a set of labelled training regions. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output D_(k) as a class probability distribution or as a classification decision vector. The theory of SVMs (en.wikipedia.org/wiki/Support_vector_machine) and GBCs (en.wikipedia.org/wiki/Gradient_boosting) is not covered here, but there are numerous publications on the topics, for example Burges, Christopher J. C. “A tutorial on support vector machines for pattern recognition.” Data Mining and Knowledge Discovery 2.2 (1998): 121-167, and Schapire, Robert E., and Yoav Freund. Boosting: Foundations and Algorithms. MIT Press, 2012.
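Using the RegionCNN sketched above, the FIG. 7 variant might be realized by reading V_(k) off the convolutional stack and handing it to a scikit-learn SVM; the split point and SVC settings are assumptions for illustration.

    import torch
    from sklearn.svm import SVC

    def extract_features(cnn, regions: torch.Tensor) -> torch.Tensor:
        """V_k taken before the final fully connected classification layers (block 710)."""
        with torch.no_grad():
            return torch.flatten(cnn.features(regions), start_dim=1)

    # Block 720 (illustrative): fit on labelled training regions, then classify.
    # svm = SVC(probability=True).fit(extract_features(cnn, train_x).numpy(), train_y)
    # d = svm.predict_proba(extract_features(cnn, test_x).numpy())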

In another embodiment of the invention, block 120 of FIG. 1 comprises the architecture illustrated in FIG. 9. Block 900 is optional and operates in the same fashion as block 500 of FIG. 5 or block 700 of FIG. 7. Thus, for each retinal region R_(k), block 900 produces the normalized region R′_(k). Then, in block 910 a dimensionality reduction technique, such as Linear Discriminant Analysis (LDA, en.wikipedia.org/wiki/Linear_discriminant_analysis) or Principal Components Analysis (PCA, en.wikipedia.org/wiki/Principal_component_analysis), is used to produce a reduced dimensionality representation of each region R′_(k), which is used as a feature vector V_(k) for that region. Then, in block 920 a classifier is used to classify each feature vector V_(k) and produce a classification vector D_(k). As in the previous embodiment, a binary or multi-class SVM or Gradient Boosting Classifier, previously trained on the feature vectors of a set of labelled training regions, is typically a good choice for the classifier. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output D_(k) as a class probability distribution or as a classification decision vector.
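A corresponding sketch of the FIG. 9 variant, with PCA standing in for block 910 and a gradient boosting classifier for block 920 (scikit-learn; the component count is an illustrative assumption):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import GradientBoostingClassifier

    def fit_pca_classifier(train_regions: np.ndarray, train_labels: np.ndarray,
                           n_components: int = 64):
        """train_regions: (n, q, q, 3) array of normalized regions R'_k."""
        flat = train_regions.reshape(len(train_regions), -1)
        pca = PCA(n_components=n_components).fit(flat)            # block 910
        clf = GradientBoostingClassifier().fit(pca.transform(flat),
                                               train_labels)      # block 920
        return pca, clf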

We now consider the operation of block 130 of FIG. 1. Block 130 of FIG. 1 analyzes all the retinal regions independently and for each region R_(k) produces a quality classification vector Q_(k)=(Q_(k)⁰, . . . , Q_(k)^(m-1)), where m is the number of classes that the classifier of block 130 has been trained on and Q_(k)^(z) is the class membership probability of region k for class z. As a first example, in one embodiment of the invention, the classifier in block 130 may be trained to classify regions to m=2 classes, with z=0 the “Good Quality” class label and z=1 the “Bad Quality” class label. As a second example, in a different embodiment of the invention, the classifier in block 130 may be trained to classify regions to m>2 classes, with z=0 the “Good Quality” class label, z=1 the “Bad Illumination” class label, z=2 the “Bad Contrast” class label, . . . , z=m−1 the “Lens Flare” class label.

Thus, in such an embodiment, different types of quality artifacts and imaging artifacts that may be present in the retinal fundus image are assigned separate class labels. Various embodiments of the invention may employ classifiers trained on different numbers of classes to achieve the highest possible performance in assessing the quality of retinal regions. It should be noted that the choice of the number of classes for the region quality classifier is independent from the choice of the number of classes for the DR lesion classifier. In all cases, in alternative embodiments of the invention, the classifier in block 130 may be configured to output Q_(k)^(z) as a binary 0 or 1 value rather than a class membership probability, in which case Q_(k) is not a class probability distribution but a classification decision vector.

We now consider the architecture of block 130 of FIG. 1 in more detail. In one embodiment of the invention, block 130 of FIG. 1 comprises the architecture illustrated in FIG. 10. In block 1000, retinal regions are optionally normalized. This optional normalization may, for example, involve intensity or geometric normalizations. Thus, for each retinal region R_(k), block 1000 produces the normalized region R′_(k). Then, in block 1010 a CNN is used to classify each region R′_(k) according to its quality and produce a quality classification vector Q_(k). The CNN has been previously trained using a large number of training samples, i.e., labelled training regions. As seen earlier, different embodiments of the invention may employ CNNs trained on only two classes, for example “Good Quality” and “Bad Quality”, or more classes, for example “Good Quality”, “Bad Illumination”, “Bad Contrast” and so on, and, in all cases, the CNN may output Q_(k) as a class probability distribution or as a classification decision vector.

FIG. 11 illustrates the internal architecture of the CNN of block 1010; this is substantially the same as the CNN architecture of FIG. 6 described earlier, although network parameters such as the size, stride and number of filters, the learned convolution masks, the number of convolution layers, the number of fully connected layers and so on may be different.

In another embodiment of the invention, block 130 of FIG. 1 comprises the architecture illustrated in FIG. 12. Block 1200 is optional and operates in the same fashion as block 1000 of FIG. 10. Thus, for each retinal region R_(k), block 1200 produces the normalized region R′_(k). Then, in block 1210 a CNN is used to produce a high-dimensional vector of distinguishing features V_(k) for each region R′_(k).

FIG. 13 illustrates the internal architecture of this CNN. As can be seen in FIG. 13, the CNN used to produce the feature vector V_(k) is the same as the CNN of FIG. 11, with V_(k) taken at the output of one of the layers preceding the final fully connected classification layer. Then, back to FIG. 12, in block 1220 a classifier is used to classify each feature vector V_(k) and produce a classification vector Q_(k). This classifier may, for example, be a binary or multi-class SVM or GBC, previously trained on the feature vectors of a set of labelled training regions. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output Q_(k) as a class probability distribution or as a classification decision vector.

In another embodiment of the invention, block 130 of FIG. 1 comprises the architecture illustrated in FIG. 14. As can be seen in FIG. 14, different techniques are employed to assess the quality of each region R_(k) according to different criteria, including but not limited to illumination, contrast, blur, etc. Various well-known image processing techniques may be employed to perform this region quality analysis. The illumination quality metric of block 1400 may, for example, be calculated as min(abs((τ_(H)+τ_(L)−2f)/(τ_(H)−τ_(L))), 1)∈[0,1], where f denotes the average region intensity and τ_(L) and τ_(H) the lowest and highest acceptable intensity for a region, respectively. The contrast quality metric of block 1410 may be calculated using a suitable technique, for example one of the techniques described in Tripathi, Abhishek Kumar, Sudipta Mukhopadhyay, and Ashis Kumar Dhara. “Performance metrics for image contrast.” Image Information Processing (ICIIP), 2011 International Conference on. IEEE, 2011, normalized to [0,1]. Similarly, the blur quality metric of block 1420 may be calculated using a suitable technique, for example the technique described in Marziliano, Pina, et al. “A no-reference perceptual blur metric.” Image Processing. 2002. Proceedings. 2002 International Conference on. Vol. 3. IEEE, 2002, normalized to [0,1].

Finally, block 1430 computes a “Good Quality” metric based on the illumination, contrast, blur, etc. metrics, for example as Q_(k)⁰=1−max(Q_(k)¹, Q_(k)², . . . , Q_(k)^(j-1)), and produces the final quality assessment vector Q_(k) for each region R_(k).
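A sketch of this rule-based assessment follows; the illumination metric implements the formula given above, while the acceptable-intensity bounds τ_L and τ_H are illustrative values.

    import numpy as np

    def illumination_metric(region: np.ndarray, tau_lo: float = 40.0,
                            tau_hi: float = 215.0) -> float:
        """Block 1400: 0 means well illuminated; values near 1 mean too dark/bright."""
        f = float(region.mean())
        return min(abs((tau_hi + tau_lo - 2 * f) / (tau_hi - tau_lo)), 1.0)

    def quality_vector(metrics: np.ndarray) -> np.ndarray:
        """Block 1430: metrics holds Q_k^1..Q_k^(j-1) in [0, 1]; prepend Q_k^0."""
        return np.concatenate(([1.0 - metrics.max()], metrics))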

We now consider the operation of block 140 of FIG. 1. Block 140 of FIG. 1 analyzes the DR lesion classification vectors D_(k) and quality classification vectors Q_(k) and produces conditioned DR lesion classification vectors D″_(k). This operation entails three main steps, namely: (1) DR lesion classification vector spatial probability adjustment. This step is optional. Each DR lesion classification vector D_(k) corresponds to a retinal region R_(k) and, as seen earlier, each R_(k)=R(x, y) is a retinal region centered on pixel (x, y) of normalized image I′. Thus, for each DR lesion classification vector D_(k) corresponding to a retinal region R_(k) centered on pixel (x, y) of I′, it is possible to adjust the values of D_(k) based on the DR lesion classification vectors of its surrounding regions, for example those centered within a distance d of (x, y). This adjustment may, for example, take the form of mean filtering, median filtering, etc. and is beneficial in creating more robust and reliable DR lesion classification vectors for a retinal fundus image. (2) Quality classification vector probability adjustment. This step is optional. The principle of this is identical to step (1) above, but the operations are performed on the quality classification vectors Q. (3) DR lesion classification vector adjustment based on quality classification vectors.

We now consider the architecture of block 140 of FIG. 1 in more detail. FIG. 15 illustrates the architecture of block 140 of FIG. 1. Block 1500 implements DR lesion classification vector spatial probability adjustment. That is, for each DR lesion classification vector D_(k) corresponding to a retinal region R_(k) centered on pixel (x, y) of I′, the values of D_(k) are adjusted based on the DR lesion classification vectors of its surrounding regions centered within a distance d of (x, y). This adjustment may, for example, take the form of mean, median, min, max or other filtering, to produce the adjusted DR lesion classification vector D′_(k). An alternative way of viewing this is as follows: For image I′, l spatial probability maps may be produced, where l is the number of DR classification labels in D. For example, for l=2 classes, with c=0 the “Healthy” class label and c=1 the “Diseased” class label, a “Healthy” probability map ^(DR)P⁰ and a “Diseased” probability map ^(DR)P¹ may be produced. Then, for each DR lesion classification vector D_(k) corresponding to a retinal region R_(k) centered on pixel (x, y) of I′, ^(DR)P^(c)(x/s, y/s)=D_(k)^(c), where s is the sampling stride used in region extraction. Each ^(DR)P^(c) may then be spatially processed, e.g. with a mean, median, min, max or other filter, or other morphological operations, such as dilation, erosion, etc., to produce adjusted probability maps ^(DR)P′^(c), which may then be mapped back to adjusted DR lesion classification vectors D′_(k). In a similar fashion to block 1500, block 1510 implements quality classification vector spatial probability adjustment. Finally, block 1520 of FIG. 15 performs DR lesion classification vector adjustment based on quality classification vectors. There are various possibilities in how this adjustment may be performed.
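As a sketch of the probability-map view of block 1500, assuming scipy for the spatial filtering: build each per-class map from the region vectors, then filter it.

    import numpy as np
    from scipy.ndimage import median_filter

    def build_probability_maps(centers, d_vectors, l, shape, s):
        """Set P^c(x/s, y/s) = D_k^c; centers are the (x, y) of each region R_k."""
        maps = np.zeros((l, shape[0] // s + 1, shape[1] // s + 1))
        for (x, y), d in zip(centers, d_vectors):
            maps[:, y // s, x // s] = d
        return maps

    def adjust_probability_maps(maps: np.ndarray, size: int = 3) -> np.ndarray:
        """Spatial processing, here median filtering, producing the maps P'^c."""
        return np.stack([median_filter(p, size=size) for p in maps])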

As one example, for each D′_(k) and for each class label c out of l DR classes, this adjustment may be performed as

$D_{k}^{{\prime\prime}\; c} = \left\{ \begin{matrix}D_{k}^{\prime c} & {{{if}\mspace{14mu} Q_{k}^{\prime \; 0}} \geq 0.5} \\0 & {{{if}\mspace{14mu} Q_{k}^{\prime 0}} < 0.5}\end{matrix} \right.$

where Q′_(k)⁰ denotes the “Good Quality” class probability for region R_(k). In essence, with the above relation, if the “Good Quality” class probability for region R_(k) is less than 0.5, all DR classification vector probabilities for region R_(k) are reduced to 0, i.e., the system does not deliver any DR classification probabilities for that region because the quality is too poor.

As another example, for each D′_(k) and for each class label c out of l DR classes, this adjustment may be performed as

$D_{k}^{{\prime\prime}\; c} = \left\{ \begin{matrix}D_{k}^{\prime \; c} & {{{if}\mspace{14mu} Q_{k}^{\prime 0}} \geq 0.5} \\{\left( {Q_{k}^{\prime 0} + {{0.2}5}} \right)D_{k}^{\prime c}} & {{{if}\mspace{14mu} 0.5} > Q_{k}^{\prime 0} \geq 0.25} \\0 & {{{if}\mspace{14mu} Q_{k}^{\prime 0}} < 0.5}\end{matrix} \right.$

where Q′_(k)⁰ denotes the “Good Quality” class probability for region R_(k). With this relation, the drop-off in the DR classification probabilities down to 0 due to poor quality is more gradual.
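A sketch of block 1520 implementing this second, graduated rule (with the hard cutoff read as Q′_(k)⁰ < 0.25, consistent with the middle interval of the relation above):

    import numpy as np

    def gate_by_quality(d_prime: np.ndarray, q_good: float) -> np.ndarray:
        """d_prime: adjusted DR vector D'_k; q_good: 'Good Quality' probability Q'_k^0."""
        if q_good >= 0.5:
            return d_prime
        if q_good >= 0.25:
            return (q_good + 0.25) * d_prime        # gradual attenuation
        return np.zeros_like(d_prime)               # quality too poor: suppress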

In the examples above, each D′_(k) is adjusted based on the corresponding Q′_(k). In alternative embodiments of the invention, each D′_(k) corresponding to a retinal region R_(k) centered on pixel (x, y) of I′ may be adjusted based on the corresponding Q′_(k) as well as the quality classification vectors of surrounding regions, for example those centered within a distance d of (x, y).

We now consider the operation of block 150 of FIG. 1. Block 150 of FIG. 1 analyzes the DR lesion classification vectors D″ and produces a final retinal fundus image-level DR classification vector C=(C⁰, . . . , C^(w-1)), where w is the number of classes and C^(v) is the class membership probability for class v. In one embodiment of the invention, w=2, with v=0 the “Healthy” class label and v=1 the “Diseased” class label. As a second example, in a different embodiment of the invention, w=4 classes, with each class corresponding to a DR medical grade. In all cases, in alternative embodiments of the invention, C^(v) may take a binary 0 or 1 value, in which case C is not a class probability distribution but a classification decision vector.

We now consider the architecture of block 150 of FIG. 1 in more detail. In one embodiment of the invention, block 150 of FIG. 1 comprises the architecture of FIG. 16. In block 1600 of FIG. 16, the DR lesion classification vector probabilities are thresholded, for example as

$D_{k}^{\prime \prime \prime c} = \left\{ \begin{matrix}1 & {{{if}\mspace{14mu} D_{k}^{{\prime\prime}\; c}} \geq \tau_{D}^{c}} \\0 & {{{if}\mspace{14mu} D_{k}^{{\prime\prime}\; c}} < \tau_{D}^{c}}\end{matrix} \right.$

Then, in block 1610, the thresholded vectors D′″ undergo spatial processing. This block operates in substantially the same fashion as block 1500 of FIG. 15, performing operations such as median filtering, erosion, dilation, and so on. In alternative embodiments of the invention, block 1610 may be skipped, while in different embodiments of the invention the order of blocks 1600 and 1610 may be reversed. Then, in block 1620, the number of regions for each class label c is counted. As seen earlier, in some embodiments of the invention, l=2 classes, with c=0 the “Healthy” class label and c=1 the “Diseased” class label. In different embodiments of the invention, l>2 classes, with c=0 the “Healthy” class label, c=1 the “Micro Aneurism” class label, c=2 the “Hard Exudate” class label, . . . , c=l−1 the “Laser Scar” class label. Thus, block 1620 produces a vector G=(G⁰, . . . , G^(l-1)), where each element G^(c) is a region count for the corresponding label c. Then, based on the region count vector G, block 1630 produces a final retinal fundus image classification decision vector C=(C⁰, . . . , C^(w-1)), where w is the number of classes and C^(v) is the decision for class v. As seen earlier, in one embodiment of the invention, w=2, with v=0 the “Healthy” class label and v=1 the “Diseased” class label. In a different embodiment of the invention, w=4 classes, with each class corresponding to a DR medical grade, which can be established based on the numbers of different types of DR lesions in the retinal fundus image.
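The FIG. 16 pipeline might be sketched as follows; the per-class thresholds τ_D^c and the count-to-grade rule of block 1630 are illustrative assumptions, not prescribed values.

    import numpy as np

    def grade_image(d_cond: np.ndarray, tau: np.ndarray) -> np.ndarray:
        """d_cond: (n, l) conditioned vectors D''_k; tau: per-class thresholds tau_D^c."""
        decisions = (d_cond >= tau).astype(int)     # block 1600: D'''_k
        counts = decisions.sum(axis=0)              # block 1620: vector G
        # Block 1630 (illustrative rule): grade by the number of lesion regions.
        diseased = int(counts[1:].sum())
        grade = int(np.digitize(diseased, [1, 20, 100]))   # grades 0..3, i.e. w = 4
        c = np.zeros(4, dtype=int)
        c[grade] = 1
        return c                                    # decision vector C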

In another embodiment of the invention, block 150 of FIG. 1 comprises the architecture of FIG. 17. In block 1700 of FIG. 17, the DR lesion classification vectors D″ are converted into l 2D probability maps, where l is the number of DR classification labels in D″. The process of converting a set of DR lesion classification vectors into 2D probability maps is substantially the same as described earlier for block 1500 of FIG. 15. In creating the 2D probability maps, block 1700 may optionally (i) change the dynamic range of the probabilities, for example from a real range of [0,1] to an integer range [0,255], and (ii) subsample the 2D probability maps to a fixed resolution of t×t pixels, for example t=256 pixels. Thus, block 1700 generates l 2D probability maps P=(P⁰, . . . , P^(l-1)). Then, in block 1710 a CNN is used to classify the probability maps P and produce a final retinal fundus image-level DR classification vector C=(C⁰, . . . , C^(w-1)), where w is the number of classes and C^(v) is the class membership probability for class v. The CNN has been previously trained using a large number of training samples. As seen earlier, different embodiments of the invention may employ CNNs trained on different numbers of classes, for example w=2, with v=0 the “Healthy” class label and v=1 the “Diseased” class label, or w=4 classes, with each class corresponding to a DR medical grade. In all cases, in alternative embodiments of the invention, C^(v) may take a binary 0 or 1 value, in which case C is not a class probability distribution but a classification decision vector. FIG. 18 illustrates the internal architecture of the CNN of block 1710; this is substantially the same as the CNN architectures of FIG. 6 and FIG. 11 described earlier, although network parameters such as the size, stride and number of filters, the learned convolution masks, the number of convolution layers, the number of fully connected layers and so on may be different.
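A sketch of the map-preparation step of block 1700, assuming OpenCV for the subsampling; the dynamic-range change and the value of t follow the text.

    import cv2
    import numpy as np

    def maps_to_cnn_input(prob_maps: np.ndarray, t: int = 256) -> np.ndarray:
        """prob_maps: (l, H, W) in [0, 1]; returns (l, t, t) uint8 maps in [0, 255]."""
        resized = [cv2.resize(p.astype(np.float32), (t, t),
                              interpolation=cv2.INTER_AREA) for p in prob_maps]
        return np.stack([np.clip(r * 255, 0, 255).astype(np.uint8) for r in resized])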

In another embodiment of the invention, block 150 of FIG. 1 comprises the architecture illustrated in FIG. 19. Block 1900 operates in the same fashion as block 1700 of FIG. 17. Then, in block 1910 a CNN is used to produce a high-dimensional vector of distinguishing features V. FIG. 20 illustrates the internal architecture of this CNN. As can be seen in FIG. 20, the CNN used to produce the feature vector V is the same as the CNN of FIG. 18, with V taken at the output of one of the layers preceding the final fully connected classification layer. Then, back to FIG. 19, in block 1920 a classifier is used to classify the feature vector V and produce the classification vector C. This classifier may, for example, be a binary or multi-class SVM or Gradient Boosting Classifier, previously trained on the feature vectors of a set of training samples. As seen earlier, different embodiments of the invention may employ classifiers trained on two or more classes and may output C as a class probability distribution or as a classification decision vector.

FIG. 21 illustrates an alternative embodiment of the invention. There, blocks 2100 and 2110 operate in substantially the same fashion as blocks 100 and 110 of FIG. 1, performing image normalization and region extraction.

We now consider the operation of block 2120 of FIG. 21. Block 2120 of FIG. 21 analyzes all the retinal regions independently and for each region R_(k) produces a joint DR lesion/quality classification vector A_(k)=(D_(k), Q_(k))=(D_(k)⁰, . . . , D_(k)^(l-1), Q_(k)⁰, . . . , Q_(k)^(m-1)), where l is the number of DR lesion classes as described previously and m is the number of region quality classes as described previously. Thus, in the training of the classifier of block 2120, each training sample is assigned both a DR lesion class label and a quality class label, allowing the classifier to produce a joint DR lesion/quality classification vector A_(k)=(D_(k), Q_(k))=(D_(k)⁰, . . . , D_(k)^(l-1), Q_(k)⁰, . . . , Q_(k)^(m-1)) for each region R_(k). As seen previously, the classifier in block 2120 may be configured to output binary 0 or 1 values rather than class membership probabilities.
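One plausible realization, sketched below, is a single CNN trunk with one output head whose entries are split into the l DR-lesion probabilities and the m quality probabilities; this two-softmax arrangement is an assumption for illustration, not the specific configuration of block 2120.

    import torch
    import torch.nn as nn

    class JointRegionCNN(nn.Module):
        """Produce A_k = (D_k, Q_k) from a shared feature trunk (cf. block 2120)."""
        def __init__(self, trunk: nn.Module, feat_dim: int, l: int = 2, m: int = 2):
            super().__init__()
            self.trunk, self.l = trunk, l
            self.head = nn.Linear(feat_dim, l + m)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            logits = self.head(torch.flatten(self.trunk(x), start_dim=1))
            d = torch.softmax(logits[:, :self.l], dim=1)    # D_k over l classes
            q = torch.softmax(logits[:, self.l:], dim=1)    # Q_k over m classes
            return torch.cat([d, q], dim=1)                 # A_k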

We now consider the architecture of block 2120 of FIG. 21 in more detail. Block 2120 of FIG. 21 comprises the architecture illustrated in FIG. 22. Block 2200 performs retinal region normalization in substantially the same fashion as block 500 of FIG. 5 and block 700 of FIG. 7. Thus, for each retinal region R_(k), block 2200 produces the normalized region R′_(k). Then, in block 2210 a CNN is used to classify each region R′_(k) and produce the classification vector A_(k). FIG. 23 illustrates the architecture of block 2210 of FIG. 22 in more detail. This is substantially the same as the CNN architecture of FIG. 6 and FIG. 11 described earlier, although network parameters such as the size, stride and number of filters, the learned convolution masks, the number of convolution layers, the number of fully connected layers and so on may be different.

Then, back to FIG. 21, block 2130 performs regional DR lesion probability conditioning in substantially the same fashion as block 140 of FIG. 1, and block 2140 performs retinal fundus image classification in substantially the same fashion as block 150 of FIG. 1.

In some embodiments, the process of imaging is performed by a computing system. In some embodiments, the computing system includes one or more computing devices, for example, a personal computer that is IBM, Macintosh, Microsoft Windows or Linux/Unix compatible, or a server or workstation. In one embodiment, the computing device comprises a server, a laptop computer, a smart phone, a personal digital assistant, a kiosk, or a media player, for example. In one embodiment, the computing device includes one or more CPUs, which may each include a conventional or proprietary microprocessor. The computing device further includes one or more memories, such as random access memory (“RAM”) for temporary storage of information, one or more read-only memories (“ROM”) for permanent storage of information, and one or more mass storage devices, such as a hard drive, diskette, solid state drive, or optical media storage device. Typically, the modules of the computing device are connected using a standards-based bus system. In different embodiments, the standards-based bus system could be implemented in Peripheral Component Interconnect (PCI), Microchannel, Small Computer System Interface (SCSI), Industrial Standard Architecture (ISA) and Extended ISA (EISA) architectures, for example. In addition, the functionality provided for in the components and modules of the computing device may be combined into fewer components and modules or further separated into additional components and modules.

The computing device is generally controlled and coordinated by operating system software, such as Windows XP, Windows Vista, Windows 7, Windows 8, Windows Server, Embedded Windows, Unix, Linux, Ubuntu Linux, SunOS, Solaris, iOS, Blackberry OS, Android, or other compatible operating systems. In Macintosh systems, the operating system may be any available operating system, such as MAC OS X. In other embodiments, the computing device may be controlled by a proprietary operating system. Conventional operating systems control and schedule computer processes for execution, perform memory management, provide file system, networking, and I/O services, and provide a user interface, such as a graphical user interface (“GUI”), among other things.

The exemplary computing device may include one or more commonly available I/O interfaces and devices, such as a keyboard, mouse, touchpad, touchscreen, and printer. In one embodiment, the I/O interfaces and devices include one or more display devices, such as a monitor or a touchscreen monitor, that allow the visual presentation of data to a user. More particularly, a display device provides for the presentation of GUIs, application software data, and multimedia presentations, for example. The computing device may also include one or more multimedia devices, such as cameras, speakers, video cards, graphics accelerators, and microphones, for example. The I/O interfaces and devices provide a communication interface to various external devices. The computing device is electronically coupled to a network, which comprises one or more of a LAN, WAN, and/or the Internet, for example, via a wired, wireless, or combination of wired and wireless, communication link. The network communicates with various computing devices and/or other electronic devices via wired or wireless communication links.

Images to be processed according to methods and systems described herein may be provided to the computing system over the network from one or more data sources. The data sources may include one or more internal and/or external databases, data sources, and physical data stores. The data sources may include databases storing data to be processed with the imaging system according to the systems and methods described above, or the data sources may include databases for storing data that has been processed with the imaging system according to the systems and methods described above. In some embodiments, one or more of the databases or data sources may be implemented using a relational database, such as Sybase, Oracle, CodeBase, MySQL, SQLite, and Microsoft® SQL Server, as well as other types of databases such as, for example, a flat file database, an entity-relationship database, an object-oriented database, a NoSQL database, and/or a record-based database.

The computing system includes an imaging system module that may be stored in the mass storage device as executable software code that is executed by the CPU. This module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The computing system is configured to execute the imaging system module in order to perform, for example, automated low-level image processing, automated image registration, automated image assessment, automated screening, and/or to implement the architectures described above.

In general, the word “module,” as used herein, refers to logic embodied in hardware or firmware, or to a collection of software instructions, possibly having entry and exit points, written in a programming language, such as, for example, Python, Java, Lua, C and/or C++. Software modules may be provided on a computer readable medium, such as an optical storage medium, a flash drive, or any other tangible medium. Such software code may be stored, partially or fully, on a memory device of the executing computing device, such as the computing system, for execution by the computing device. Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The code modules may be stored on any type of non-transitory computer-readable medium or computer storage device, such as hard drives, solid state memory, optical discs, and/or the like. The systems and modules may also be transmitted as generated data signals (for example, as part of a carrier wave or other analog or digital propagated signal) on a variety of computer-readable transmission media, including wireless-based and wired/cable-based media, and may take a variety of forms (for example, as part of a single or multiplexed analog signal, or as multiple discrete digital packets or frames). The processes and algorithms may be implemented partially or wholly in application-specific circuitry. The results of the disclosed processes and process steps may be stored, persistently or otherwise, in any type of non-transitory computer storage such as, for example, volatile or non-volatile storage.

The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and subcombinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.

Conditional language, such as, among others, “can,” “could,” “might,” or “may,” unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or steps. Thus, such conditional language is not generally intended to imply that features, elements and/or steps are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without user input or prompting, whether these features, elements and/or steps are included or are to be performed in any particular embodiment. The term “including” means “including but not limited to.” The term “or” means “and/or.”

Any process descriptions, elements, or blocks in the flow or block diagrams described herein and/or depicted in the attached figures should be understood as potentially representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of the embodiments described herein, in which elements or functions may be deleted or executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art.

All of the methods and processes described above may be embodied in, and partially or fully automated via, software code modules executed by one or more general purpose computers. For example, the methods described herein may be performed by the computing system and/or any other suitable computing device. The methods may be executed on the computing devices in response to execution of software instructions or other executable code read from a tangible computer readable medium. A tangible computer readable medium is a data storage device that can store data that is readable by a computer system. Examples of computer readable media include read-only memory, random-access memory, other volatile or non-volatile memory devices, CD-ROMs, magnetic tape, flash drives, and optical data storage devices.

It should be emphasized that many variations and modifications may be made to the above-described embodiments, the elements of which are to be understood as being among other acceptable examples. All such modifications and variations are intended to be included herein within the scope of this disclosure. The foregoing description details certain embodiments. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the systems and methods can be practiced in many ways. For example, a feature of one embodiment may be used with a feature of a different embodiment. As is also stated above, it should be noted that the use of particular terminology when describing certain features or aspects of the systems and methods should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the systems and methods with which that terminology is associated.

What is claimed is:
 1. A method of classifying a funduscopic image, comprising: normalizing the funduscopic image; segmenting the funduscopic image into a plurality of regular regions; automatically determining a quality for each respective regular region, representing at least one quantitative quality characteristic of the regular region; automatically determining a probability of retinopathy for each respective regular region; automatically classifying the funduscopic image with respect to a presence of retinopathy; and outputting an indication of the presence of retinopathy in the funduscopic image.
 2. The method according to claim 1, wherein the automatic determination of the probability of retinopathy for each respective regular region is determined by a neural network.
 3. The method according to claim 2, wherein the neural network is a deep neural network trained based on expert annotation of regions of funduscopic images for retinopathy.
 4. The method according to claim 1, wherein the automatically determined quality for each respective regular region is a vector determined by a neural network.
 5. The method according to claim 1, wherein the automatically determined quality for each respective regular region comprises an assessment of at least one of a brightness, a focus, lens flare, a contrast, and a color.
 6. The method according to claim 1, wherein the automatically determined quality for each respective regular region comprises an assessment of at least two of a brightness, a focus, a lens flare, a contrast, and a color.
 7. The method according to claim 1, wherein the regions of poor quality do not contribute to a classification probability of the funduscopic image.
 8. The method according to claim 1, wherein a classification probability of a respective regular region of the funduscopic image is dependent on a multivalued vector of the quality of the regular region and the probability of retinopathy for each respective regular region.
 9. The method according to claim 1, wherein the probability of retinopathy for each respective regular region is determined by at least one of a multi-class support vector machine classifier, and a Gradient Boosting Classifier.
 10. The method according to claim 1, wherein said outputting the indication of the presence of retinopathy in the funduscopic image comprises outputting a 2D probability map.
 11. The method according to claim 1, wherein said outputting the indication of the presence of retinopathy in the funduscopic image comprises outputting a degree of diabetic retinopathy with respect to at least three different grades.
 12. A method of classifying a funduscopic image, comprising: normalizing and segmenting a funduscopic image into a plurality of regular regions; automatically determining a probability of retinopathy for each respective regular region; generating a 2D probability map for a probability of presence of indicia of retinopathy in respective regular regions, each having dynamic range of probability; and classifying the 2D probability map with respect to a presence of retinopathy in the funduscopic image.
 13. The method according to claim 12, wherein the automatic determination of the probability of retinopathy for each respective regular region is determined by a deep neural network, trained based on expert annotation of regions of funduscopic images for retinopathy.
 14. The method according to claim 12, further comprising automatically determining a multivalued, multiparameter quality vector for each respective regular region.
 15. The method according to claim 14, wherein the automatically determined quality vector for each respective regular region comprises an assessment of at least one of a brightness, a focus, lens flare, a contrast, and a color.
 16. The method according to claim 12, wherein a classification probability of a respective regular region of the funduscopic image is dependent on a multivalued vector of the quality of the regular region and the probability of retinopathy for each respective regular region.
 17. The method according to claim 12, wherein the probability of retinopathy for each respective regular region is determined by at least one of a multi-class support vector machine classifier, and a Gradient Boosting Classifier.
 18. A method for classifying diabetic retinopathy in a funduscopic image, comprising: converting a plurality of diabetic retinopathy lesion classification vectors for respective regions of a funduscopic image into a plurality of 2D probability maps, wherein a number of 2D probability maps corresponds to a number of diabetic retinopathy classification labels; and automatically classifying the plurality of 2D probability maps with a convolutional neural network, to produce a retinal funduscopic image-level diabetic retinopathy classification vector, wherein the convolutional neural network is trained with a plurality of funduscopic images expert annotated for diabetic retinopathy.
 19. The method according to claim 18, further comprising automatically determining a quality for each respective region, representing at least one quantitative quality characteristic of the regular region, wherein an application of a respective region in the retinal funduscopic image-level diabetic retinopathy classification vector is dependent on a corresponding quality for the respective region.
 20. The method according to claim 18, further comprising: normalizing the funduscopic image; segmenting the funduscopic image into a plurality of regular regions; automatically determining a quality for each respective regular region, representing at least one quantitative quality characteristic of the regular region; and outputting the retinal funduscopic image-level diabetic retinopathy classification vector. 